On the prediction of DNA-binding proteins only from primary sequences: A deep learning approach
نویسندگان
چکیده
DNA-binding proteins play pivotal roles in alternative splicing, RNA editing, methylating and many other biological functions for both eukaryotic and prokaryotic proteomes. Predicting the functions of these proteins from primary amino acids sequences is becoming one of the major challenges in functional annotations of genomes. Traditional prediction methods often devote themselves to extracting physiochemical features from sequences but ignoring motif information and location information between motifs. Meanwhile, the small scale of data volumes and large noises in training data result in lower accuracy and reliability of predictions. In this paper, we propose a deep learning based method to identify DNA-binding proteins from primary sequences alone. It utilizes two stages of convolutional neutral network to detect the function domains of protein sequences, and the long short-term memory neural network to identify their long term dependencies, an binary cross entropy to evaluate the quality of the neural networks. When the proposed method is tested with a realistic DNA binding protein dataset, it achieves a prediction accuracy of 94.2% at the Matthew's correlation coefficient of 0.961. Compared with the LibSVM on the arabidopsis and yeast datasets via independent tests, the accuracy raises by 9% and 4% respectively. Comparative experiments using different feature extraction methods show that our model performs similar accuracy with the best of others, but its values of sensitivity, specificity and AUC increase by 27.83%, 1.31% and 16.21% respectively. Those results suggest that our method is a promising tool for identifying DNA-binding proteins.
منابع مشابه
Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches
DNA sequence, containing all genetic traits is not a functional entity. Instead, it transfers to protein sequences by transcription and translation processes. This protein sequence takes on a 3D structure later, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can figure is undertaken by proteins with specific functio...
متن کاملPrediction of Iranian EFL Learners’ Learning Approaches Through Their Teachers’ Narrative Intelligence and Teaching Styles: A Structural Equation Modelling Analysis
It goes without saying that there are many influential factors affecting the success of any learning experience, and teachers are definitely among the significant factors influencing the process of teaching and learning. In this respect, the present study sought to investigate the prediction of Iranian English as a Foreign Language (EFL) learners' learning approaches through their teachers’ nar...
متن کاملIn silico investigation of lactoferrin protein characterizations for the prediction of anti-microbial properties
Lactoferrin (Lf) is an iron-binding multi-functional glycoprotein which has numerous physiological functions such as iron transportation, anti-microbial activity and immune response. In this study, different in silico approaches were exploited to investigate Lf protein properties in a number of mammalian species. Results showed that the iron-binding site, DNA and RNA-binding sites, signal pepti...
متن کاملPrediction of 3D protein Structure based on Mutation of AKAP3 and PLOD3 Gene in Case of Non-Obstructive Azoospermia
Background: The present study has been designed with the aim of evaluating A-kinase anchoring proteins 3 (AKAP3)and Procollagen-Lysine, 2-Oxoglutarate 5-Dioxygenase 3 (PLOD3) gene mutations and prediction of 3D proteinstructure for ligand binding activity in the cases of non-obstructive azoospermic male.Materials and Methods: Clinically diagnosed cases of non-obstructive azoos...
متن کاملIn silico identification of epitopes from house cat and dog proteins as peptide immunotherapy candidates based on human leukocyte antigen binding affinity
The objective of this descriptive study was to determine Felis domesticus (cat) and Canis familiaris (dog) protein epitopes that bind strongly to selected HLA class II alleles to identify synthetic vaccine candidate epitopes and to identify individuals/populations who are likely to respond to vaccines. FASTA amino acid sequences of experimentally validated allergenic proteins of house cat and d...
متن کامل